THE INTELLIGIBILITY OF RECTANGULAR SPEECH-WAVES


By J. C. R. LICKLIDER, DALBIR BINDRA and IRWIN POLLACK, 
Harvard University 


One of the central problems in voice communication is that of deter- 
mining the essential characteristics of speech as a stimulus. With regard 
to intelligibility, one of the principal attributes of speech, the fundamental 
questions are: Upon what characteristics of the speech-wave does intelligi- 
bility depend? Are certain characteristics of the speech-wave of paramount 
importance for intelligibility? Are other characteristics perhaps irrelevant 
insofar as intelligibility is concerned?

In attempting to find answers to those questions, it is useful to employ 
a procedure analogous to that based upon surgical lesions and tests of 
performance in the study of brain functions. It is instructive, for example,
to operate upon the speech-wave—i.e. to distort it—and to determine the 
effect upon intelligibility by comparing pre-operative and post-operative 
articulation scores. In this way, distortion is useful as a tool with which 
to study the nature of speech. 

The present experiments concern the effects upon intelligibility of one 
particular type of distortion which, in its most extreme form, produces 
square speech. The experiments with square speech grew out of a more 
general investigation of the effects of distortion which was undertaken 
during the war.1 A part of the earlier work will be described briefly as a


* Accepted for publication August 24, 1947. This research has been carried out 
under contract with the U.S. Navy, Office of Naval Research (Contract N5ori-76, 
Report PNR-37). 

1 J. P. Egan and F. M. Wiener, On the intelligibility of bands of speech in noise, 
J. Acoust. Soc. Amer., 18, 1946, 435-441; N. B. Gross and J. C. R. Licklider, The 
effects of tilting and clipping upon the intelligibility of speech, Psycho-Acoustic 
Laboratory Report PNR-11, 1946; J. C. R. Licklider, The effects of amplitude dis- 
tortion upon the intelligibility of speech, Psycho-Acoustic Laboratory Report, OSRD 
background against which the experiments with square speech will be
discussed in more detail.


PEAK CLIPPING


The most prevalent type of amplitude distortion is the type which is introduced 
when conventional communication circuits are overdriven. In its simplest form, this 
type of distortion is called peak clipping. 

Peak clipping is best visualized in terms of what it does to the wave patterns seen 
on the face of a cathode-ray tube. In Fig. 1, the oscillogram shows the temporal 
course of the pressure variations in the air between a talker and a listener as the 


Fig. 1. OSCILLOGRAMS OF SPEECH-WAVES
“Joe took father’s shoe-bench out.” 


talker says: Joe took father’s shoe-bench out. The top line is Joe. Fig. 2 shows Joe 
again (at A), this time schematized and enlarged. The air pressure varies rapidly 
and irregularly about its resting value to form the sound j, then somewhat more 
slowly and more regularly to form the sound ō. The waves may be thought of,
alternatively, as the voltage variations in a telephone circuit. Or, very roughly, the 
upward and downward swings may be regarded as corresponding to the inward 
and outward excursions of the diaphragm of the telephone receiver as it converts 
the electrical waves into sound waves. 

Peak clipping is a very simple process. If we draw dotted lines on either side 
of the center axis (Fig. 2-A) and then clip along the dotted lines, we clip off the 


No. 4217 (PB 19775), 1944. (This and other reports from the Psycho-Acoustic
Laboratory are available through the Office of Technical Services, U.S. Department of
Commerce, Washington, D.C. It has been summarized in: Effects of amplitude
distortion upon the intelligibility of speech, J. Acoust. Soc. Amer., 18, 1946,
429-434.)
peaks of the speech-wave, leaving only the squared-off center portion shown at B.
The actual operation is performed electronically with a series-diode clipper circuit.2

Peak clipping may be more or less severe, depending upon how much of the 
wave is clipped off and how much is left. At B (Fig. 2), the wave is stripped down 
to one-half its original peak-to-peak amplitude. Since a halving of amplitude is a 
reduction of 6 db., this is 6-db. peak clipping. At C, the wave is reduced to one- 
tenth of its original amplitude—20-db. peak clipping. 

With extremely severe peak clipping, there is of course only a small part of 


Fig. 2. DIAGRAM ILLUSTRATING PEAK CLIPPING


The waves of the word Joe are shown schematically at A. B shows what is left after 
6-db. peak clipping, i.e. after reduction to one-half the original peak-to-peak
amplitude. C illustrates 20-db. peak clipping, i.e. reduction to one-tenth the original
amplitude. At the right, the waves of B and C are shown reamplified until their 
peak-to-peak amplitudes are equal to the peak-to-peak amplitude of A. 


the original wave left to be seen or heard. For that reason it is frequently of in- 
terest to reamplify the clipped wave. At the right-hand side of Fig. 2, the waves 
are shown reamplified until their peak-to-peak amplitudes are equal to that of the 
original wave (A). 
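
The operation just described can be put in the form of a short computation. The following is only a sketch under stated assumptions, not the series-diode circuit used in the experiments: the function names and the sine-wave test signal are hypothetical, and clipping is taken to be symmetrical about the center axis.

```python
import math

def remaining_fraction(clip_db):
    """Fraction of the original peak amplitude left after clip_db of peak clipping."""
    return 10 ** (-clip_db / 20)

def peak_clip(samples, clip_db, reamplify=False):
    """Clip symmetrically about the center axis; optionally restore the original peak."""
    peak = max(abs(s) for s in samples)
    limit = peak * remaining_fraction(clip_db)
    clipped = [max(-limit, min(limit, s)) for s in samples]
    if reamplify and limit > 0:
        gain = peak / limit  # undo the amplitude lost to clipping
        clipped = [s * gain for s in clipped]
    return clipped

# Hypothetical test wave: one cycle of a sine wave with unit peak amplitude.
wave = [math.sin(2 * math.pi * n / 64) for n in range(64)]

clipped_6 = peak_clip(wave, 6)                     # about one-half the original peak
clipped_20 = peak_clip(wave, 20)                   # one-tenth the original peak
restored_20 = peak_clip(wave, 20, reamplify=True)  # same peak as the original wave

print(max(clipped_6), max(clipped_20), max(restored_20))
```

Note that 6 db. corresponds to a factor of about 0.501 rather than exactly one-half; the text rounds this to one-half.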


EFFECTS OF PEAK CLIPPING UPON INTELLIGIBILITY 


The initial experiments were conducted to find out how severely intel- 
ligibility is reduced by amounts of peak clipping which might conceivably 
be encountered in practice. Various amounts of peak clipping between 
0 and 20 db. were introduced into otherwise high-quality audio
communication circuits, and the intelligibility of discrete words transmitted


2 Gross and Licklider, op. cit., and J. C. R. Licklider and G. A. Roberts, A
premodulation clipper unit for voice communication transmitters, Psycho-Acoustic
Laboratory Report IC-100 (PB 19807), 1945.
over the circuits was determined by means of articulation tests.3

To those of us who had assumed that faithful reproduction of the speech-wave
was essential for high intelligibility, the results of the initial experiments
were quite surprising. Never did the articulation scores fall below
96%. Insofar as intelligibility was concerned, communication was essen- 
tially perfect—just as much so with nine-tenths of the speech-wave clipped 
off as with high-fidelity reproduction of the entire wave. 

These tests were made under conditions favorable for communication. 
Since it appeared reasonable to suppose that detrimental effects of peak 
clipping might be more evident under less favorable conditions, further 
experiments were conducted. In some, intense noise was produced at the 
talker’s position, and in those experiments detrimental effects of peak 
clipping did appear. In tests in which the talker was in quiet and the 
listeners were in a noisy location, the effect of peak clipping was even 
more surprising than it had been in the initial tests mentioned above: it 
was sometimes possible to obtain higher intelligibility with peak clipping 
than without it. When the communication equipment had insufficient power 
to pass the high-amplitude parts of the speech-wave without distortion, 
intelligibility was markedly improved by clipping off the peaks of the 
wave and using the available power for the remainder of the wave. 

This observation shifted the emphasis from impairment of intelligibility 
to improvement. It became obvious that speech-waves are, for many pur- 
poses, simply of the wrong shape, and that with the aid of peak clipping 
they can be fitted into a more efficient ‘package.’ Peak clipping was
introduced intentionally, and with considerable benefit, into such items of
communication equipment as radio transmitters and hearing aids.4

The resistance of intelligibility to impairment by peak clipping offered, 
nevertheless, something of a challenge. An array of clippers and amplifiers 


3 Articulation testing procedures are described in detail in J. P. Egan, Articulation
testing methods, Psycho-Acoustic Laboratory Report OSRD No. 3802 (PB
22848), 1944. A more complete account of the initial experiments with peak clipping
is given in Licklider, op. cit.

4 H. Davis, C. V. Hudgins, R. J. Marquis, R. H. Nichols, Jr., G. E. Peterson,
D. A. Ross and S. S. Stevens, The selection of hearing aids, Laryngoscope, 56, 1946,
85-115, 135-153; H. Davis, S. S. Stevens, R. H. Nichols, Jr., C. V. Hudgins, R. J.
Marquis, G. E. Peterson and D. A. Ross, Hearing Aids: An Experimental Study
of Design Objectives, 1947, 24-94; K. D. Kryter and M. I. Stein, The advantages
of clipping the peaks of speech-waves prior to radio transmission, Psycho-Acoustic
Laboratory Report IC-83 (PB 22859), 1944 (Summarized in: Premodulation clipping
in AM voice communication, J. Acoust. Soc. Amer., 19, 1947, 125-131);
Licklider and Roberts, op. cit., 11-15; W. W. Smith, Premodulation speech clipping
and filtering, QST, 30, 1946, 46-50; G. A. Miller and S. Mitchell, Effects of
distortion on the intelligibility of speech at high altitudes, J. Acoust. Soc. Amer.,
19, 1947, 120-125.
was assembled, and articulation-tests were conducted in quiet with 10, 20,
30, 40, 50..., 100 db. of peak clipping. In each case, the clipped wave was 
reamplified until its peak amplitude was equal to that of the original, 
undistorted wave. The results of these tests are shown in Fig. 3. 

The lower scale of Fig. 3 indicates the amount of peak clipping in 
decibels, and the upper scale shows the fraction of the peak-to-peak ampli- 
tude of the original speech-wave which remained after the peaks were 
Fig. 3. EFFECT OF PEAK CLIPPING UPON THE INTELLIGIBILITY OF SPEECH IN QUIET


Intelligibility is expressed in terms of the percentage of the test-words heard
correctly. The lower scale shows the amount of peak clipping in decibels; the upper
scale indicates the fraction of the peak-to-peak amplitude of the original wave which
remained after clipping and before reamplification. Each solid circle represents the
average of the scores made by one listener on four 100-word tests. The open circles
and triangles shown for 0-db. clipping and infinite clipping represent the scores
made by two other listeners in tests with more difficult words.


clipped off. This fraction refers, it is important to note, to the amplitude 
which the squared-off wave would have had if it had not been reamplified. 
The ordinate, “per cent word articulation,” is the percentage of the words 
correctly understood. The curve shows that the intelligibility of monosyl- 
labic words, heard out of context, does fall off somewhat as the amount of 
peak clipping exceeds 20 db., but that beyond 50 db. the curve levels out 
and there is no further decline. The solid circles represent data for only one 
listener (JL), but curves of the same general form have been obtained in 
a number of other experiments with other observers.5 That the curve flattens


5 Davis, et al., Hearing Aids: An Experimental Study of Design Objectives, 1947,
37 and 44. 
out as it does is a consequence of the fact that the amount of peak clipping
is expressed in decibels. All but 1% of the original amplitude range is 
eliminated by 40 db. of clipping, with the result that the speech-wave is 
reduced to very little more than a succession of rectangular waves. It is 
hardly to be expected, therefore, that further clipping and reamplification 
would have any marked effect upon intelligibility. 
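
The arithmetic behind this flattening can be sketched as follows. The example assumes, purely for illustration, a sine-wave input; the function names are hypothetical. It computes the fraction of the original amplitude that survives a given amount of clipping, and the fraction of each cycle during which the clipped wave is still moving between the two clipping levels rather than resting on one of them.

```python
import math

def remaining_fraction(clip_db):
    """Fraction of the original amplitude range left after clip_db of clipping."""
    return 10 ** (-clip_db / 20)

def sloped_time_fraction(clip_db):
    """Fraction of a sine cycle spent between the clipping levels.

    For a unit sine wave, |sin(theta)| < a holds over 4*asin(a) radians
    of each 2*pi cycle, i.e. a fraction (2/pi)*asin(a) of the time.
    """
    return (2 / math.pi) * math.asin(remaining_fraction(clip_db))

for clip_db in (0, 20, 40, 60, 100):
    print(clip_db, remaining_fraction(clip_db), sloped_time_fraction(clip_db))
```

On this assumption the wave is already flat for more than 99% of the time at 40 db., so that additional clipping can alter only the brief transitions.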

The exact level at which the curve flattens out depends upon a number 
of factors, including the skill of the listener in understanding distorted 
speech and the difficulty of the speech material. The open circles and 
triangles in Fig. 3 represent the scores of two observers listening to more 
difficult words. Even with ‘infinite’ clipping, the poorer one heard over 
50% of the words correctly; 50% articulation here corresponds
approximately to 90% sentence intelligibility. Other observations showed that
infinite peak clipping causes no great difficulty and makes necessary only 
few repeats in ordinary conversation. 


INFINITE PEAK CLIPPING 


The use of the term ‘infinite’ to describe a physical operation requires a qualifying
note. Continued indefinitely, peak clipping and reamplification would reduce
speech to a series of perfectly square or rectangular waves. Since the process cannot 
in practice be continued indefinitely, there can be no truly infinite peak clipping, 
but a certain degree of idealization can be achieved by performing the actual con- 
version of normal speech into square speech in two steps. 

First, the speech-wave is subjected to successive peak clipping and reamplifica- 
tion, as illustrated in Fig. 2, until the wave is as completely squared off as a rea- 
sonable number of amplifiers and clippers (100 to 200 db. of peak clipping) can 
make it. Then, the nearly square output of the clippers is passed through a ‘flip-flop’ 
circuit or electronic switch which eliminates any remaining deviations from ideal 
rectangularity.6

After having been put through this treatment, the words Joe and shoe have the 
wave-forms illustrated in Fig. 4. The output of the electronic switch remains at a 
fixed positive voltage as long as the original speech-wave is above the center axis, or 
at a fixed negative voltage as long as the original wave is below the center axis. When- 
ever the original wave crosses the center axis, the square wave switches up or 
down as fast as it can go. 
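
In digital terms the electronic switch amounts to keeping only the sign of the wave. The sketch below is an idealization under stated assumptions, not a model of the actual circuit: the function names and the damped sine-wave input are hypothetical, and a sample lying exactly on the center axis is assumed to leave the switch in its previous state.

```python
import math

def infinite_clip(samples, level=1.0):
    """Return +level while the wave is above the axis, -level while it is below."""
    out = []
    state = level  # assumed initial state of the switch
    for s in samples:
        if s > 0:
            state = level
        elif s < 0:
            state = -level
        out.append(state)  # a sample exactly on the axis holds the last state
    return out

def zero_crossings(samples):
    """Count the sign changes between successive samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if (a > 0) != (b > 0))

# Hypothetical input: a damped sine wave standing in for a speech sound.
wave = [math.exp(-n / 300) * math.sin(2 * math.pi * n / 50 + 0.3) for n in range(400)]
square = infinite_clip(wave)

print(zero_crossings(wave), zero_crossings(square))
```

The square wave retains the times at which the original wave crosses the axis and discards everything else, including the decay in amplitude.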

Inasmuch as the infinite clipper is sensitive to extremely small variations in the 
input wave, a peculiar situation arises when the talker stops talking. In Fig. 4,
the square speech-wave is represented, for simplicity, as subsiding to the center axis 
at the end of a word, but actually the output of the infinite clipper continues to 
switch back and forth at full amplitude. These oscillations during the interval


6 Except, of course, those which are imposed by the phase and frequency-response
characteristics of the headphones. In the experiments to be described, high-quality
dynamic receivers (Permoflux PDR-10) were used.

between words are caused, of course, by the residual noise in the input wave. In the
tests this residual noise was mainly record scratch, but the effect would not have 
been markedly different in an ideal circuit, because there would be electron- 
fluctuation noise even in an ideal circuit; and squared-off record scratch can hardly 
be distinguished from squared-off fluctuation-noise. In any event, the intense noise 
during the interval between words gave the listeners the feeling that the words 
themselves should have been almost entirely masked. The words were not masked, of 
course, because the noise subsided when the words appeared, but the listeners were 
at first somewhat amazed that they could understand the speech as clearly as they 
did against such a noisy background. 

It is possible to make the square noise that appears between words inaudible. 
This is accomplished by introducing, into the infinite clipper along with the speech 
signal, an ultrasonic signal—a 25 kc. sine wave is convenient—just strong enough 


Fig. 4. OSCILLOGRAMS ILLUSTRATING INFINITELY CLIPPED SPEECH


These waves were obtained by passing normal speech waves (Joe and shoe)
through a series of peak clippers and amplifiers, and then through a ‘flip-flop’
circuit. The very rapid oscillations at the beginning of each square-wave pattern
correspond to the consonant; the more widely spaced oscillations in the middle
and at the end, to the vowel.


to override the residual noise of the test-circuit. The ultrasonic signal is not
sufficiently intense to have any effect when speech is present, but it determines the
output of the infinite clipper during the intervals between words. Since the output 
consists, during these intervals, of 25,000 square waves per second there is no 
energy in the range of audible frequencies, and the listeners hear only silence. 
With this treatment, the square speech sounds better, but it is little if any more 
intelligible. 
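
The action of the ultrasonic signal can be imitated numerically. The following is a rough sketch only: the sampling rate, the amplitudes, the 500-cycle stand-in for speech, and all names are assumptions chosen for illustration, and the phase offset is introduced merely so that no sample falls on a zero of the 25 kc. wave.

```python
import math
import random

SAMPLE_RATE = 200_000  # samples per second (assumed for illustration)
DITHER_HZ = 25_000     # the 25 kc. sine wave mentioned in the text

def sign(x):
    return 1.0 if x >= 0 else -1.0

def clip_with_dither(signal, dither_amp):
    """Add the ultrasonic sine wave to the input, then clip infinitely."""
    out = []
    for n, s in enumerate(signal):
        phase = 2 * math.pi * DITHER_HZ * n / SAMPLE_RATE + math.pi / 8
        out.append(sign(s + dither_amp * math.sin(phase)))
    return out

def crossings(samples):
    """Count the switches between the two output levels."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a != b)

rng = random.Random(0)
n_samples = 2000  # 0.01 sec. at the assumed sampling rate
noise = [rng.uniform(-0.001, 0.001) for _ in range(n_samples)]  # residual noise
speech = [math.sin(2 * math.pi * 500 * n / SAMPLE_RATE + 0.1)   # stand-in for speech
          for n in range(n_samples)]

silent = clip_with_dither(noise, dither_amp=0.01)
voiced = clip_with_dither(speech, dither_amp=0.01)

duration = n_samples / SAMPLE_RATE
print(crossings(silent) / duration)  # switching rate between words
print(crossings(voiced) / duration)  # switching rate while 'speech' is present
```

With these assumed values the output switches about 50,000 times per second between words (twice per cycle of the 25 kc. wave), whereas during the stand-in speech the switching follows the much slower crossings of the speech wave.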


INTELLIGIBILITY OF SQUARE SPEECH IN NOISE 


As soon as it became clear that square speech was reasonably intelligible, 
several questions suggested themselves. How is the intelligibility of in- 
finitely clipped speech affected by noise? Do listeners learn to understand 
square speech better with practice? Can we learn anything from square 
speech about the nature of speech as a stimulus?

The question of the effect of noise upon the intelligibility of square 
speech arose because it had been observed in earlier work, with less severe 
peak clipping, that noise tends to cover up the distortion products and 
make clipped speech sound less distorted. It was, of course, not expected
that noise would actually make square speech more intelligible, but it 
was considered probable that the impairment of intelligibility due to noise 
would be less for square than for normal speech. 

Tests were conducted, therefore, with square speech and normal speech, 
equated in peak amplitude and heard against a background of ‘white’ noise. 
This white noise is to be distinguished from the residual circuit noise which 
entered the infinite clipper with the speech-wave. The white noise was 
introduced intentionally at a point following the infinite clipper. 

The word-lists used in the tests were recorded on acetate disks by two 
talkers using RS 38-A carbon microphones.7 The frequency response of
the system, including cutter and playback, was uniform within ±5 db.
from 250 to 7000 cps., but fell off rapidly beyond those limits. Either with 
or without clipping, frequencies above 7000 cps. are quite unimportant so 
far as intelligibility is concerned. It is important, however, that the very- 
low-frequency components of the speech wave were attenuated, and that 
the attenuation occurred before the wave was subjected to infinite clipping. 
Severe peak clipping appears to be less deleterious if the low-frequency 
components are suppressed than it is if they are allowed to remain in the 
wave upon which the clipper acts.8

For the articulation-tests with normal speech, the records were played 
back with just enough amplification to make the peak sound pressure of 
the average word equal to 85 db. above the reference level, 0.0002
dyne/cm². For the tests with square speech, the records were played back through
the infinite-clipping device, the output of which was set so that the sound 
pressure (both peak and root-mean-square) was also 85 db. above the 
reference level. These sound-pressure levels were measured in a metal cavity 
of approximately the same volume (6 cc.) as the outer ear. In successive 
tests, normal and square speech were alternated at random, and the white 
noise was introduced, also in random sequence, at five different levels. 
Four of the noise levels were 68, 73, 78, and 83 db. above 0.0002 dyne/cm²,
rms. The corresponding peak pressures are 80, 85, 90, and 95 db.9 The


7 RCA recorder, Type MI-4915-AX, with cutter, Type MI-11850; 12-in. disks;
78 rpm.

8 Gross and Licklider, op. cit., 20-28.

9 Since the instantaneous amplitudes (pressures) of white noise are distributed
normally, the term ‘peak’ must be defined arbitrarily. Here it was determined by 
viewing the electrical waves of the noise on the face of a cathode-ray tube and 
noting the amplitude of the highest peaks recurring once or twice a second. The 
‘peak’ as thus defined was approximately four times the root-mean-square amplitude, 
i.e. four standard-deviation units from the mean in the distribution of instantaneous 
amplitudes. 
fifth noise level was the lowest which could be obtained with the apparatus
and the room used in the tests—a weak combination of amplifier hum and 
room noise which was entirely negligible so far as its influence upon the 
articulation scores was concerned. 
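
The levels just given fix the peak speech-to-noise ratios of the four noisy conditions. The short computation below is offered only as a check on the arithmetic; the constant and function names are hypothetical, and the 12-db. allowance for the peaks of the noise is the figure implied by the peak and rms levels quoted above.

```python
import math

SPEECH_PEAK_DB = 85              # peak level of the average word, from the text
NOISE_RMS_DB = [68, 73, 78, 83]  # rms noise levels, from the text
NOISE_PEAK_OVER_RMS_DB = 12      # 'peak' taken as about four times the rms amplitude

def noise_peak_db(noise_rms_db):
    """Peak level of the noise corresponding to a given rms level."""
    return noise_rms_db + NOISE_PEAK_OVER_RMS_DB

def peak_snr_db(noise_rms_db):
    """Ratio of peak speech amplitude to peak noise amplitude, in decibels."""
    return SPEECH_PEAK_DB - noise_peak_db(noise_rms_db)

# A factor of four in amplitude is very nearly 12 db.
print(20 * math.log10(4))

for rms in NOISE_RMS_DB:
    print(rms, noise_peak_db(rms), peak_snr_db(rms))
```

On this reckoning the four noise levels correspond to peak speech-to-noise ratios of +5, 0, -5, and -10 db.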

The results of the tests are shown in Fig. 5. The articulation percentages 
are plotted against the ratio of the peak amplitudes of the speech and the 
Fig. 5. EFFECT OF NOISE UPON INTELLIGIBILITY


The intelligibility of undistorted speech is reduced measurably even by faint noise,
whereas the noise has to be relatively strong to affect the intelligibility of infinitely
clipped speech. Moreover, when the comparison is drawn (as it is here) between
square speech and normal speech equated in terms of peak amplitude, square
speech proves to be the more intelligible in intense noise.


noise. The results of the present experiment are indicated by circles, each 
circle representing the mean of 30 test-scores for each of two listeners. The 
squares, based on 4 to 8 tests with 10 listeners, are from an earlier experiment
with white noise.10 The curves confirm the expectation that square
speech would be more resistant than normal speech to masking by white 
noise. Fig. 5 shows in fact that, when normal speech and infinitely clipped 


10 J. P. Egan, J. Miller, M. I. Stein, G. G. Thompson and T. H. Waterman,
Studies of the effect of noise on speech communication, Psycho-Acoustic Laboratory 
Report OSRD No. 2038 (PB 22907), 1943. 
speech, equated in peak-to-peak amplitude, are heard against a background
of white noise, the normal speech is the more intelligible if the speech-to-noise
ratio is high, but the clipped speech is the more intelligible if the
speech-to-noise ratio is low. 

For purposes of discussion, it is convenient first to consider the some- 
what startling result that square speech was sometimes more intelligible 
than normal speech. After discussing that result, we can more easily in- 
terpret the observation that the noise level had less effect upon the articula- 
tion-scores for square speech than upon the articulation-scores for normal 
speech. The immediate question is, therefore: what characteristic of square 
speech gives it an advantage over normal speech at low speech-to-noise 
ratios? This characteristic would appear to be simply the squareness of the 
wave-form. The square wave-form is optimal for packing a large amount 
of power into a small space. The jagged, irregular wave-form of normal 
speech, on the other hand, is in that respect very inefficient. When the 
two waves are equated in peak amplitude, the square speech-wave has in
fact about sixteen times as much power as the normal speech-wave. Fur- 
thermore, this power is distributed equitably among the consonants and the 
vowels, whereas in normal speech almost all the power is in the vowels 
where it is not needed, and the consonants—which are highly important 
for intelligibility—are relatively weak and easily masked by noise. 

This marked advantage in power is very important in the presence of 
intense noise: a signal cannot be intelligible if it is not audible, and power 
is the principal factor determining audibility. Actually, if the abscissa in 
Fig. 5 were changed from the ratio of the peak amplitudes of the speech 
and the noise to the ratio of the root-mean-square amplitudes, the curve 
for normal speech would remain unchanged and the curve for square speech 
(now deprived of its power advantage) would be shifted 12 db. to the 
right.11 Infinitely clipped speech would then be represented as always less 
intelligible than normal speech, but the difference in intelligibility would 
be much greater at high than at low speech-to-noise ratios. 
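
The figures of sixteen times and 12 db. follow from the ratios of peak to root-mean-square amplitude. The sketch below merely restates that arithmetic; the function names are hypothetical, and the 12-db. and 0-db. ratios are those quoted in this paper for normal speech and square speech.

```python
import math

def db_to_power_ratio(db):
    """Convert a level difference in decibels to a ratio of powers."""
    return 10 ** (db / 10)

def crest_factor_db(samples):
    """Ratio of peak amplitude to root-mean-square amplitude, in decibels."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(peak / rms)

NORMAL_SPEECH_CREST_DB = 12.0  # peak-to-rms ratio quoted for normal speech
SQUARE_SPEECH_CREST_DB = 0.0   # peak-to-rms ratio quoted for square speech

# With peaks equated, the difference in crest factor is the power advantage.
advantage_db = NORMAL_SPEECH_CREST_DB - SQUARE_SPEECH_CREST_DB
print(advantage_db, db_to_power_ratio(advantage_db))

# An ideal square wave has equal peak and rms amplitudes.
square_wave = [1.0, -1.0] * 50
print(crest_factor_db(square_wave))
```

A difference of 12 db. is a power ratio of a little under sixteen, and the same 12 db. is the amount by which the square-speech curve moves to the right when the abscissa is changed to the ratio of root-mean-square amplitudes.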

The advantage that accrues to infinitely clipped speech because of its 
square wave-form decreases as the speech is made progressively stronger 
relative to the noise. By the time the peak speech-to-noise ratio is about 
3 db., the power advantage of square speech is balanced by the deleterious
effects of distortion (see Fig. 5). At higher speech-to-noise ratios, normal 
speech is more intelligible than square speech. Normal speech continues 


11The ratios of peak amplitude to root-mean-square amplitude for normal speech, 
square speech, and white noise are, respectively, 12 db., 0 db., and 12 db. 
to become more and more intelligible as the ratio increases because more
and more of the weak consonant sounds pass the masked threshold. Infi- 
nitely clipped speech, on the other hand, reaches its ceiling of intelligibility 
quickly, probably because all the phonemes of square speech are of the 
same intensity, and therefore exceed the masked threshold almost simul- 
taneously. 

Thus it would appear that the resistance of square speech to masking by 
noise is due: (1) to the fact that the square wave-form is the best possible 
wave-form for transmitting power over a system of limited amplitude- 
handling capability; and (2) to the fact that severe distortion is inherent in 
infinite clipping. The one factor tends to make square speech audible at 
low peak speech-to-noise ratios; the other factor tends to limit the intelligi- 
bility even in the absence of noise. 


LEARNING TO UNDERSTAND SQUARE SPEECH 


The articulation-tests described in connection with the discussion of the 
effects of noise had been planned with a second aim: to find out how much 
improvement would occur, as a result of continued practice, in the listeners’ 
understanding of square speech. The two listeners (DB and IP) had 
never before served as Ss in articulation-tests. They took twenty 50-word
tests a day for 30 days. Each day there were 10 tests with undistorted speech 
and 10 tests with infinitely clipped speech. Each set of 10 tests included 
2 tests at each of the 5 speech-to-noise ratios. 

Inasmuch as the same recorded word-lists were used over and over for 
10 successive test-days, there was no chance for changes in the performance 
of the talker to have a systematic effect upon the articulation-scores. Improve- 
ment in the scores as the experiment progressed could therefore be identified 
as learning on the part of the listeners. The listeners were allowed no other 
knowledge of their results than what they could gain by hearing the word- 
lists repeated in successive test-sessions. 

New copies of the recorded word-lists were introduced at the end of the 
tenth day so that the effect of record wear could be determined, and new 
recordings of new word-lists were introduced at the end of the twentieth 
day so that the amount of transfer to a new vocabulary could be measured. 
between the seventh and eighth test sessions. 

12 In order to make sure that the characteristics of the records would be uniform,
we used copies of master recordings (all the copies cut with the same equipment)
in each part of the experiment. The total recorded vocabulary of 1000 words read
by two talkers was divided into two parts by assigning half of the records of each
talker to one part, half to the other, with the aid of random numbers.
The effect of record-wear and the degree of transfer to the new vocabulary
are shown in Fig. 6, which is based on articulation-scores averaged without 
regard to speech-to-noise ratio or to listener. Inasmuch as there is no 
Fig. 6. RECORD WEAR AND SUBSTITUTION OF NEW TEST-VOCABULARY AS FACTORS
INFLUENCING THE COMPARISON OF UNDISTORTED SPEECH AND SQUARE SPEECH


When at the end of the tenth test-session new copies of the recorded word-lists
were introduced, there was little or no effect upon the articulation-scores; but the
substitution of a new vocabulary of test-words at the end of the twentieth
test-session made the scores fall markedly.


abrupt change in score-level between the tenth and eleventh days, it is 
evident that record-wear was not an important factor. Between the twentieth 
and twenty-first days, however, the scores fell sharply, showing that part 
of what the listeners had learned during the 20 days of practice was specific 
to the particular words to which they had been listening. In the case of the 
infinitely clipped speech, approximately two-thirds of the learning carried 
over to the new vocabulary; in the case of the unclipped speech,
approximately one-half.

It appears from Fig. 6 that the listeners improved considerably more 
in understanding infinitely clipped speech than they did in understanding 
normal speech. Although this was actually the fact, it is best not to conclude 
that square speech is inherently more susceptible to learning than normal 
Fig. 7. LEARNING CURVES FOR CLIPPED AND UNCLIPPED SPEECH


These curves are similar to those of Fig. 6, except that the results for the various
speech-to-noise ratios are shown separately and the scores for each speech-to-noise
ratio are averaged over 5-day intervals. The uppermost and the lowermost curves
of the right-hand graph show that the listeners made little or no improvement with
practice when they heard normal speech either in quiet (S/N = ∞) or in intense
noise (S/N = -10 db.). The remainder of the curves show considerable amounts of
learning and of transfer from one vocabulary of test-words to the other.


speech without first examining learning curves for the several speech-to- 
noise ratios separately. These are shown in Fig. 7, in which the data for 
5-day periods are averaged. Fig. 7 suggests that most of the difference be- 
tween the two curves of Fig. 6 is accounted for by two of the series with 
unclipped speech, the one in quiet (S/N = ∞) and the one in extreme
noise (S/N = -10 db.). There was practically no learning of normal
speech under either of these conditions. With the other speech-to-noise 
ratios, however, the listeners improved almost as much in understanding 
normal speech in noise as in understanding square speech in noise. 


STATISTICAL ANALYSIS OF THE ARTICULATION DATA


The principal relations shown in Figs. 5 and 6, and probably also those shown 
in Fig. 7, are sufficiently clear-cut that statistical tests of their significance are 
largely superfluous. Nevertheless, it is of some interest to have an estimate of the 
precision of the articulation-measurements and to consider the question, to what 
extent is it reasonable to generalize from the present results obtained with only 
two talkers and two listeners? 

As a practical matter, the best estimate of the reliability of the results probably 
is an estimate based in part upon past experience with articulation-tests and in part 
upon statistical analysis of the data of the present experiment. On this basis, we 
would consider it fair agreement if a repetition de novo of the comparison of square 
speech and normal speech should yield curves which are consistently 5% higher or 
lower than those shown in Figs. 5, 6, and 7. The range of uncertainty indicated by this 
statement of expectation makes rough allowance for minor inaccuracies in the 
calibrations of meters and for systematic differences among experimenters in de- 
termining the peak amplitudes of the test-words, in setting up the experimental 
conditions, etc. It also takes into consideration the skill of the talkers and listeners. 

If no attempt is made to take into account sources of error which may have 
operated in a constant way throughout the experiment, i.e. if judgment is based 
solely upon the internal consistency of the results, the estimate of error which 
results is much smaller than the more inclusive estimate made in the preceding 
paragraph. The standard error of the individual test-scores (the square root of the 
error variance in an analysis of the variance in the experiment) is, in fact, less than 
9 percentage units. Since each circle in Fig. 5 represents the mean of 120 such scores, 
the datum points in that figure may be considered reliable, for the particular 
subjects used, within ± 1 percentage unit. For the points in Figs. 6 and 7 (each 
based on 20 scores), the estimated standard error is less than 2 percentage units. 
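
The figures in this paragraph follow from dividing the error variance by the number of scores averaged. A minimal sketch of that arithmetic, assuming that the error variance meant here is the remainder variance of 67.3 listed in Table I and that the individual scores are independent (the function name is illustrative, not taken from the original analysis):

```python
import math

# Remainder variance (second- and higher-order interactions) from Table I,
# in squared units of the transformed scale. Treating it as the error
# variance of a single test-score is an assumption of this sketch.
ERROR_VARIANCE = 67.3


def standard_error_of_mean(error_variance: float, n_scores: int) -> float:
    """Standard error of the mean of n_scores independent test-scores."""
    return math.sqrt(error_variance / n_scores)


se_single = math.sqrt(ERROR_VARIANCE)                    # one test-score
se_fig5 = standard_error_of_mean(ERROR_VARIANCE, 120)    # each circle in Fig. 5
se_fig6_7 = standard_error_of_mean(ERROR_VARIANCE, 20)   # each point in Figs. 6 and 7

print(f"single score:      {se_single:.2f}")
print(f"Fig. 5 point:      {se_fig5:.2f}")
print(f"Figs. 6 and 7:     {se_fig6_7:.2f}")
```

The three values fall below the bounds of 9, 1, and 2 percentage units quoted in the text.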

The analysis of variance, from which the estimates just described were derived, 
is shown in Table I. As indicated in the table, the total variation within the matrix 
of the 1200 test-scores was analyzed into: (a) components assignable to the six 
experimental variables, (b) components assignable to the first-order interactions, and 
(c) an unassigned remainder. The column headed n gives the number of degrees 
of freedom associated with each source of variation. F is the ratio of the mean 
square variation associated with each source to the remainder variance. (Actually, 
two remainder variances were computed, one including all the interactions, the 
other including only second- and higher-order interactions. The six experimental 
variables were tested against the former; the first-order interactions were tested 
against the latter.) P, in the table, is the probability that the fluctuations of random 
sampling, alone, would give rise to an F-ratio as great as, or greater than, the 
ratio actually obtained. 
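
The F-ratios can be reproduced directly from the mean squares. The sketch below assumes the values printed in Table I and uses illustrative names of our own; it computes only the ratios and does not attempt the probabilities P, which require the F-distribution.

```python
# Mean squares as read from Table I (a few rows only, for illustration).
REMAINDER_ALL_INTERACTIONS = 284.0   # error term for the six experimental variables
REMAINDER_HIGHER_ORDER = 67.3        # error term for the first-order interactions

main_effects = {
    "Clipping (C)": 14786.0,
    "Speech-to-Noise Ratio (SNR)": 123075.0,
    "Listener (L)": 105953.0,
}
first_order_interactions = {
    "C x SNR": 58428.0,
    "C x L": 2513.0,
    "T x S": 35.9,
}


def f_ratio(mean_square: float, remainder: float) -> float:
    """Ratio of a source's mean square to the remainder variance."""
    return mean_square / remainder


f_main = {name: f_ratio(ms, REMAINDER_ALL_INTERACTIONS)
          for name, ms in main_effects.items()}
f_inter = {name: f_ratio(ms, REMAINDER_HIGHER_ORDER)
           for name, ms in first_order_interactions.items()}

for name, f in {**f_main, **f_inter}.items():
    print(f"{name:30s} F = {f:.3g}")
```

Each main variable is divided by the remainder that includes all interactions, and each first-order interaction by the remainder that includes only the higher-order interactions, as the parenthetical remark above describes.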

In making the analysis of variance summarized in the table, it was found that, 
with the data in the form of percentage scores, the assumption of homogeneity of 
variance could not be made. To minimize the heterogeneity of variance, a 
‘homogenizing’ transformation was used. For each of the 1200 test-scores (A = 
percentage of words correct out of 50), a transformed score (I) was obtained with 
the aid of tables of the equation, 

$$I = 50 \sin^{-1}\left(\frac{A}{50} - 1\right),$$

and the transformed scores were used in the analysis.14 The units of the transformed 
scale are equal to the units of the original percentage scale for articulation-scores 
near 50%. For scores near 0 or 100%, however, the units of the transformed scale are 
smaller than those of the percentage scale. Therefore, in taking the square root of 
the remainder variance, which is actually in the units of the transformed scale, to 
be equal to the standard error of an individual test score in percentage units, we 
are making a conservative approximation. The value obtained in that way is an 
unbiased estimate of the standard error of measurement, in percentage units, in the 
range near 50% articulation, but in all other parts of the scale, especially near 
either end of the scale, the estimate is too high. 


TABLE I 

ANALYSIS OF THE VARIANCE IN THE EXPERIMENT COMPARING NORMAL 
SPEECH AND SQUARE SPEECH 

                                               Mean square 
Source of variation                       n     variation       F        P 
Clipping (C)                              1       14,786       52.1     <.01 
Speech-to-Noise Ratio (SNR)               4      123,075      433.      <.001 
Talker (T)                                1       13,778       48.5     <.01 
Listener (L)                              1      105,953      373.      <.01 
Block (B)                                 2       14,601       51.4     <.01 
Session (S)                               9        2,285        8.05    <.01 
Remainder including all interactions   1181          284 
C × SNR                                   4       58,428      868       <.01 
C × T                                     1          951       14.1     <.01 
C × L                                     1        2,513       37.3     <.01 
C × B                                     2        3,097       46.0     <.01 
C × S                                     9          228        3.39    <.01 
SNR × T                                   4          353        5.25    <.01 
SNR × L                                   4          284        4.22    <.01 
SNR × B                                   8          293        4.35    <.01 
SNR × S                                  36           99.4      1.48    <.05 
T × L                                     1           91.5      1.36     - 
T × B                                     2          519        7.71    <.01 
T × S                                     9           35.9      0.533    - 
L × B                                     2        1,161       17.3     <.01 
L × S                                     9           74.5      1.11     - 
B × S                                    18          250        3.71    <.01 
Remainder including higher-order 
  interactions                         1071           67.3 
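
The behavior of the transformed scale described above can be checked numerically. The sketch below is a minimal illustration, assuming that the transformation is 50 times the inverse sine of (A/50 − 1) with the angle in radians (the choice of radians is our assumption; it is the one that makes the two scales agree near 50%). The function names are illustrative.

```python
import math


def transform(a_percent: float) -> float:
    """Transformed score for an articulation score A given in percent (0 to 100)."""
    if not 0.0 <= a_percent <= 100.0:
        raise ValueError("articulation score must lie between 0 and 100 percent")
    return 50.0 * math.asin(a_percent / 50.0 - 1.0)


def local_slope(a_percent: float, step: float = 0.01) -> float:
    """Transformed units per percentage unit, estimated by a central difference."""
    return (transform(a_percent + step) - transform(a_percent - step)) / (2.0 * step)


slope_mid = local_slope(50.0)   # close to 1: the two scales agree near 50%
slope_high = local_slope(95.0)  # greater than 1: transformed units are smaller here

# A standard error expressed in transformed units corresponds to fewer
# percentage units wherever the slope exceeds 1.
SE_TRANSFORMED = math.sqrt(67.3)   # square root of the remainder variance in Table I
se_percent_at_95 = SE_TRANSFORMED / slope_high

print(f"slope at 50%: {slope_mid:.3f}")
print(f"slope at 95%: {slope_high:.3f}")
print(f"standard error near 95%, in percentage units: {se_percent_at_95:.2f}")
```

Because the slope exceeds 1 everywhere except at 50%, reading the transformed standard error directly as percentage units overstates the error away from the middle of the scale, which is the sense in which the approximation is conservative.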

The parts of Table I that refer to the talkers and the listeners are of interest in 
connection with the question, to what extent can the results of the experiment be 
generalized to other communicators? The talker and listener variances are clearly 
quite large. The listener variance is in fact almost as large as the variance 
associated with the variable, speech-to-noise ratio. This would indicate, of course, 
that the general level of the articulation scores is highly dependent upon the 
personnel selected for the experiment. Actually, the Ss were selected on the theory 
that, when it is not feasible to draw a random sample, diversity is preferable to 
homogeneity if the diversity can be achieved without disturbing the central tendency 
of the results. One of the talkers had an excellent, trained voice; the other's voice 
was somewhat ‘thin’ and rather high in pitch. One of the listeners was above average 
in listening ability whereas the other had more than the usual difficulty in making 
out the test-words. 


14 The tables used were a modified form of Bliss’ tables for the arc sine 
transformation in G. W. Snedecor, Statistical Methods, 1940, 382-383. 

The important point, however, is not the general level of the articulation-scores, 
but the effect, at each of the five speech-to-noise ratios, of infinite clipping upon the 
scores. In this connection, it is important that the interaction between the variables, 
clipping and speech-to-noise ratio, is much larger than any of the interactions in- 
volving talkers or listeners. Thus the fact that the effect of infinite peak clipping 
is different for different speech-to-noise ratios (i.e. that the two curves of Fig. 5 
have greatly different slopes) would appear not to be highly dependent upon the 
selection of talkers or listeners. 

Finally, it may be of interest to note that the interaction between the two 
variables, clipping and listener, is strong. The effect of infinite peak clipping upon 
intelligibility was considerably greater for the poorer listener than it was for the 
better listener. This observation—considered in the light of the fact that the inter- 
action between the variables, speech-to-noise ratio and listener, was not particularly 
marked—suggests that in studying individual differences among listeners it may be 
of particular interest to work with highly distorted speech. 


DIMENSIONS OF SPEECH AS A STIMULUS 


One cannot experiment very long with square speech without being im- 
pressed that its intelligibility and its sound are not as different from the 
intelligibility and the sound of normal speech as the gross dissimilarity 
in wave-form would suggest. This impression corroborates a view that has 
long been implicit in auditory theory. The wave-form provides a unique 
description of speech as an acoustic signal, but it is probably not the most 
meaningful description of speech as a stimulus. The dimensions of wave- 
form, amplitude and time, are not, as such, the ones in terms of which 
the auditory system responds. If we look for another pair of orthogonal 
dimensions in the acoustic signal, we find amplitude and frequency, but 
this breakdown is not the right one either, because the auditory system 
obviously analyzes along the temporal dimension, hearing “one,” “two,” 
“three,” and “four” as separate words, not as a steady spectrum. 

Thus the auditory system distinguishes amplitude (or intensity), 
frequency, and time. Frequency and time are so related, however, that 
exactness in the specification of one can be achieved only at the expense of 
inexactness in the specification of the other. The auditory system must 
compromise between them and resolve imperfectly in both rather than 
perfectly in either alone. The appropriate representation of speech, there- 
fore, is a pattern in amplitude, frequency, and time which recognizes a 
degree of uncertainty in the simultaneous specification of the latter two 
variables. “Visible speech,” as developed by the Bell Telephone 
Laboratories, is such a pattern.15 

Instead of asking, therefore, why infinitely clipped speech is not as 
unintelligible as its wave-form would suggest, it is probably better to 
compare an intensity-frequency-time pattern of infinitely clipped speech 
with a corresponding pattern of normal speech. Such patterns, together with 
a pattern of speech subjected to a form of distortion called ‘center clip- 
ping,’ are shown in Fig. 8. 

The word represented in each of the three diagrams of Fig. 8 is shoe- 
bench. In the uppermost diagram, it is shown as it appears in the absence 
of distortion. In the middle diagram, it has been subjected to infinite peak 
clipping. In the lowermost diagram, it has been subjected to center clipping. 
Center clipping is in a sense the opposite of peak clipping: it removes the 
part of the speech-wave nearest the center axis and leaves only the peaks. 
In this instance, the center one-half of the wave was eliminated, viz. 6-db. 
center clipping. 
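
The two operations can be sketched in a few lines. This is an illustration under stated assumptions, not a description of the circuits used: the center clipper is assumed to pass only the portion of each sample that extends beyond the clipping level, the test wave is a plain sine rather than speech, and all function names are our own.

```python
import math


def infinite_peak_clip(wave):
    """Reduce every sample to +1 or -1, so that only the zero-crossings remain."""
    return [1.0 if x >= 0.0 else -1.0 for x in wave]


def center_clip(wave, threshold):
    """Remove the part of the wave lying within +/-threshold of the center axis.

    Samples inside the band become 0; samples outside it keep only the
    portion that extends beyond the threshold (assumed convention).
    """
    clipped = []
    for x in wave:
        if x > threshold:
            clipped.append(x - threshold)
        elif x < -threshold:
            clipped.append(x + threshold)
        else:
            clipped.append(0.0)
    return clipped


def threshold_for_db(peak, decibels):
    """Clipping level that lies `decibels` below the peak amplitude."""
    return peak * 10.0 ** (-decibels / 20.0)


# Stand-in test wave: one cycle of a sine with peak amplitude 1 (not speech).
wave = [math.sin(2.0 * math.pi * n / 64) for n in range(64)]
peak = max(abs(x) for x in wave)

square = infinite_peak_clip(wave)
level = threshold_for_db(peak, 6.0)     # very nearly one-half of the peak
peaks_only = center_clip(wave, level)

print(f"6-db clipping level: {level:.3f} of peak")
print(f"samples removed entirely by center clipping: {peaks_only.count(0.0)} of {len(wave)}")
```

A level 6 db below the peak is very nearly half the peak amplitude, which matches the statement that the center one-half of the wave was eliminated.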

In each of the diagrams, intensity is represented as elevation above the 
upper surface of the block, i.e. above the time-frequency plane. Time 
runs horizontally across the diagram, the several component sounds of 
shoe-bench occurring in sequence from left to right. The frequency scale 
is linear, starting with 0 cycles per second at the near edge of the block 
and reaching 4000 cycles per second at the far edge. 

The procedure used in preparing the diagrams paralleled one of the procedures 
recently described by Koenig, Dunn and Lacy.16 The word shoe-bench was recorded on 
a phonograph disk, using a single circular groove instead of the usual spiral. 
Shoe-bench was then played back repeatedly, first through a linear circuit or through 
one of the clippers, then through a narrow-band filter (ERPI Type RA-277-F, set for 
65-cycle bandwidth), the center frequency of which was moved slowly along the 
frequency-scale. The output of the filter was then rectified, and the envelope of the 
rectified wave was photographed from the face of a cathode-ray tube. Each time 
shoe-bench was repeated, one of the irregular horizontal lines seen in the diagrams 
was recorded on the film. Finally, the photographs were traced in the sequence in 
which they had been made, with the successive base-lines offset enough to separate 
the tracings in the frequency dimension. 


15 W. Koenig, H. K. Dunn and L. Y. Lacy, The sound spectrograph, J. Acoust. Soc. 
Amer., 18, 1946, 19-49; G. A. Kopp and H. C. Green, Basic phonetic principles of 
visible speech, ibid., 18, 1946, 74-89; J. C. Steinberg and N. R. French, The 
portrayal of visible speech, ibid., 18, 1946, 4-18. 

16 Koenig, Dunn and Lacy, op. cit., 22 f. 


FIG. 8. INTENSITY-FREQUENCY-TIME PATTERNS CONTRASTING THE EFFECTS OF 
INFINITE PEAK CLIPPING AND CENTER CLIPPING 

The word represented in each diagram is shoe-bench. The first diagram shows the 
normal topography of the word. The second and third diagrams show the pattern 
after the speech-wave has been subjected, respectively, to infinite peak clipping and 
to 6-db. center clipping. Note that the gross pattern of the word is much less 
severely disrupted by infinite peak clipping than by center clipping, as is evidenced 
by the greater similarity between the upper and middle diagrams than between 
the upper and lower. 

In the photographic traces obtained with infinite peak clipping, the segments 
corresponding to the intervals between words were quite irregular because of the 
‘square noise’ described in the section on Infinite Peak Clipping. The irregularities, 
which would have appeared as peaks disposed at random along the left-hand and 
right-hand edges of the time-frequency plane, were not reproduced in preparing 
the diagram illustrating infinite peak clipping. 
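
The sequence of filtering, rectifying, and taking the envelope, repeated once per filter setting, can be imitated digitally. The sketch below is only an analogue of the procedure, not a model of the apparatus: it substitutes a two-pole digital resonator for the ERPI filter and a one-pole smoother for the envelope detector, uses a pure tone in place of the recorded word, and assumes a sampling rate of 8000 per second. All names and parameter values other than the 65-cycle bandwidth are our own.

```python
import math


def resonator(wave, center_hz, bandwidth_hz, sample_rate):
    """Two-pole digital resonator, scaled to unit gain at center_hz."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2.0 * math.pi * center_hz / sample_rate
    gain = (1.0 - r) * math.sqrt(1.0 - 2.0 * r * math.cos(2.0 * theta) + r * r)
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0
    out = []
    for x in wave:
        y = gain * x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def envelope(wave, smoothing=0.02):
    """Full-wave rectify, then smooth with a one-pole low-pass filter."""
    level = 0.0
    out = []
    for x in wave:
        level += smoothing * (abs(x) - level)
        out.append(level)
    return out


def intensity_frequency_time(wave, center_freqs_hz, bandwidth_hz, sample_rate):
    """One envelope trace per filter setting, like one replay of the disk per setting."""
    return {f: envelope(resonator(wave, f, bandwidth_hz, sample_rate))
            for f in center_freqs_hz}


SAMPLE_RATE = 8000  # assumed; wide enough for the 0 to 4000 cycle range of Fig. 8
tone = [math.sin(2.0 * math.pi * 1000.0 * n / SAMPLE_RATE) for n in range(1600)]
traces = intensity_frequency_time(tone, [500.0, 1000.0, 2000.0], 65.0, SAMPLE_RATE)
mean_level = {f: sum(t) / len(t) for f, t in traces.items()}

for freq, level in mean_level.items():
    print(f"{freq:6.0f} cycles: mean envelope {level:.3f}")
```

Stacking the traces with offset base-lines, one per center frequency, would give a pattern of the same kind as those in Fig. 8, with elevation standing for intensity.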


As seen in the uppermost diagram, the main features of the undistorted 
word shoe-bench are these: (1) The sh sound consists almost entirely of 
high-frequency energy. (2) It blends into the oo, in which the 
low-frequency mountain overshadows two groups of foothills at higher 
frequencies. (3) The b, which does not appear as a separate feature in the 
pattern, is actually little more than the mode of onset of the eh sound. 
(4) The eh has three easily distinguishable concentrations of energy. (5) 
The eh blends into the n, which consists principally of low-frequency 
components. (6) The ch, with energy spread throughout the 
high-frequency end of its area, is somewhat similar in appearance to the sh, as 
indeed it should be since it is quite similar in sound. 

Comparison of the middle diagram with the uppermost diagram reveals 
that, although many of the details of the pattern are changed by infinite 
peak clipping, the general plan of the terrain is by no means rendered 
unrecognizable. The main concentrations of low-frequency and of high- 
frequency energy are still in the same places despite the rearrangement of 
minor peaks. 

In the lowermost diagram (center clipping), all trace of the general pat- 
tern has been lost, and what is left is quite unrecognizable. Actually it 
sounds more like two bursts of static than like a word. Correlation of the 
three diagrams with observations on intelligibility and quality suggests that 
the gross pattern formed by the main concentrations of energy in the 
frequency-time plane is the basic determiner of intelligibility, and that the 
details of topography are important principally in influencing the timbre 
and other qualitative characteristics of the speech. 


SUMMARY AND CONCLUSIONS 


In experiments on distortion in voice communication circuits, it was 
discovered that speech can be converted into a series of square or rec- 
tangular oscillations, without rendering it unintelligible. The conversion 
of normal speech into ‘square speech’ was accomplished by clipping off 
both the upward and the downward peaks of the normal speech-wave, 
reamplifying the remainder, clipping again and reamplifying again, until 
there was nothing left but a series of rectangular waves, resembling the 
marks and spaces of telegraphic code. In articulation-tests over circuits 
which introduced this extreme distortion (infinite peak clipping), more 
than half of the difficult, monosyllabic words in the Psycho-Acoustic Labo- 
ratory PB word-lists were heard correctly. Conversation in square speech 
involved very little uncertainty or misunderstanding. Only an occasional 
statement had to be repeated. 

Further experiments showed that, when equated in peak amplitude with 
normal speech-waves and heard in the presence of intense noise, the square 
waves of infinitely clipped speech are even more audible and more intellig- 
ible than the irregular waves of normal speech. The superiority of square 
speech under these conditions is due to the fact that its square wave-form 
carries more power per unit peak amplitude than does the irregular 
wave-form of normal speech. In quiet, on the other hand, square speech is, of 
course, less intelligible than normal speech, because the severe distortion 
involved in infinite peak clipping modifies many of the characteristic fea- 
tures which make normal speech sound as it does. In addition, the harmonics 
and combination frequencies introduced by infinite clipping tend to mask 
the intelligible residue. The essential fact, however, is that, despite the 
severe distortion, square speech is surprisingly intelligible. This fact indi- 
cates that the ‘dynamic’ aspects of the speech-signal, e.g. the pattern of the 
relative intensities of the fundamental speech-sounds, are not essential for 
intelligibility. 
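
The statement about power per unit peak amplitude can be illustrated numerically. The sketch below is a minimal example, assuming two stand-in waves of our own choosing (a steady sine and a sine whose amplitude swells and decays) in place of real speech; the function names are illustrative.

```python
import math


def infinite_peak_clip(wave):
    """Reduce every sample to +1 or -1."""
    return [1.0 if x >= 0.0 else -1.0 for x in wave]


def scale_to_peak(wave, peak=1.0):
    """Rescale the wave so that its largest excursion equals `peak`."""
    largest = max(abs(x) for x in wave)
    return [peak * x / largest for x in wave]


def mean_power(wave):
    """Mean of the squared samples."""
    return sum(x * x for x in wave) / len(wave)


def gain_db(wave):
    """Power gained, in db, by squaring up the wave at equal peak amplitude."""
    normal = scale_to_peak(wave)
    square = scale_to_peak(infinite_peak_clip(wave))
    return 10.0 * math.log10(mean_power(square) / mean_power(normal))


# Stand-in waves (assumptions for illustration, not speech recordings).
steady = [math.sin(2.0 * math.pi * n / 1000) for n in range(1000)]
peaky = [math.sin(2.0 * math.pi * n / 50) * math.exp(-((n - 500) / 150.0) ** 2)
         for n in range(1000)]

gain_steady = gain_db(steady)
gain_peaky = gain_db(peaky)

print(f"steady sine: {gain_steady:.2f} db")
print(f"peaky wave:  {gain_peaky:.2f} db")
```

A square wave spends all of its time at the peak, so its mean power equals the square of the peak amplitude; any wave that reaches its peak only momentarily carries less, and the more irregular the wave, the larger the advantage of the squared-up version.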

Infinite clipping produces not a new language that must be studied to 
be learned, but a new dialect, understandable even at first hearing. Everyone 
who has heard it agrees that square speech is immediately intelligible; that 
neither great concentration nor practice is necessary for its understanding— 
but both help. During the course of the experiment, the listeners improved 
considerably in their ability to record the words correctly. 

The fact that infinite peak clipping impairs intelligibility no more than 
it does appears reasonable when the distortion of the speech-signal is 
analyzed in terms of three-dimensional speech-patterns. Whereas the 
wave-form of speech (i.e. the representation of speech in terms of the two 
dimensions, amplitude and time, alone) is disfigured beyond recognition by 
infinite clipping, only the details of the amplitude-frequency-time pattern 
(see Fig. 8) are modified. The main concentrations of speech-energy on 
the frequency-time plane evidently are the features of the acoustic stimulus 
on the basis of which the listener's perceptual system recognizes words and 
phrases. 